BiT: Robustly Binarized Multi-distilled Transformer

Neural Information Processing Systems

Modern pre-trained transformers have rapidly advanced the state of the art in machine learning, but have also grown in parameter count and computational complexity, making them increasingly difficult to deploy in resource-constrained environments. Binarizing the weights and activations of the network can significantly alleviate these issues, but is technically challenging from an optimization perspective. In this work, we identify a series of improvements that enable binary transformers at much higher accuracy than was previously possible. These include a two-set binarization scheme, a novel elastic binary activation function with learned parameters, and a method to quantize a network to its limit by successively distilling higher-precision models into lower-precision students. Together, these approaches allow, for the first time, fully binarized transformer models at a practical level of accuracy, approaching a full-precision BERT baseline on the GLUE language understanding benchmark to within as little as 5.9%.
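The "elastic binary activation function with learned parameters" mentioned above can be illustrated with a minimal NumPy sketch of the forward pass. This is a hypothetical illustration, not the paper's implementation: the function name `elastic_binarize` and the exact parameterization (a learned scale `alpha` and threshold shift `beta`) are assumptions; in training, gradients would reach `alpha` and `beta` through a straight-through estimator, which is omitted here.

```python
import numpy as np

def elastic_binarize(x, alpha, beta):
    """Forward pass of an elastic binary activation (illustrative sketch).

    Maps real-valued activations onto the two levels {0, alpha} using a
    learned scale `alpha` and threshold shift `beta`: shift by beta,
    normalize by alpha, round, and clip to {0, 1} before rescaling.
    """
    return alpha * np.clip(np.round((x - beta) / alpha), 0.0, 1.0)

x = np.array([-1.2, 0.1, 0.6, 2.3])
out = elastic_binarize(x, alpha=1.0, beta=0.0)
# every output takes one of only two values: 0 or alpha
```

Because `alpha` and `beta` are learned per layer, the network can adapt where the binarization threshold sits and how large the surviving activation magnitude is, rather than committing to a fixed sign function.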



Inspired by the learnable bias proposed in ReActNet (Liu et al., 2020), we further propose an elastic binarization function. In contrast to the Bi-Attention proposed in BiBERT (Qin et al., 2021) that removes [...], we conduct meticulous experiments to compare these choices. The binary convolution operates on weights and activations that are both binarized to {-1, 1} (i.e. [...]).

The GLUE benchmark (Wang et al., 2019) includes the following datasets:
- MNLI: Multi-Genre Natural Language Inference, an entailment classification task (Williams et al., 2018).
- QQP: Quora Question Pairs, a paraphrase detection task.
- QNLI: Question Natural Language Inference (Wang et al., 2019), a binary classification task.
- STS-B: The Semantic Textual Similarity Benchmark, a sentence-pair classification task.
- MRPC: The Microsoft Research Paraphrase Corpus; the sentence pairs are sourced from online news sources (Dolan & Brockett, 2005).
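The appeal of a binary convolution between {-1, +1} weights and activations is that the underlying dot product reduces to bitwise operations. The sketch below illustrates the standard XNOR/popcount identity on bit-packed vectors; the helper name `binary_dot` and the bit encoding (1 for +1, 0 for -1) are illustrative assumptions, not the paper's kernel.

```python
def binary_dot(a_bits, b_bits, n):
    """Dot product of two length-n vectors over {-1, +1}, bit-packed as ints.

    Encoding: bit 1 represents +1, bit 0 represents -1. XOR marks the
    positions where the vectors disagree (each contributing -1 to the dot
    product), while agreements contribute +1, giving:
        dot(a, b) = n - 2 * popcount(a XOR b)
    """
    return n - 2 * bin(a_bits ^ b_bits).count("1")

# a = [+1, -1, +1, +1] -> 0b1011 ; b = [+1, +1, -1, +1] -> 0b1101
# real-valued dot product: 1 - 1 - 1 + 1 = 0
result = binary_dot(0b1011, 0b1101, 4)  # -> 0
```

On hardware, the XOR and population count each cover a whole machine word per instruction, which is where the large speed and memory advantages of fully binarized layers come from.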


